Goto

Collaborating Authors

 conservative estimate




Reviews: Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction

Neural Information Processing Systems

Summary: This paper proposes a new algorithm that help stabilize off-policy Q-learning. The idea is to introduce approximate Bellman updates that are based on constraint actions sampled only from the support of the training data distribution. The paper shows the main source of instability is the boostrapping error. The boostrapping process might use actions that do not lie in the training data distribution. This work shows a way to mitigate this issue.


Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark

arXiv.org Artificial Intelligence

The GLUE benchmark (Wang et al., 2019b) is a suite of language understanding tasks which has seen dramatic progress in the past year, with average performance moving from 70.0 at launch to 83.9, state of the art at the time of writing (May 24, 2019). Here, we measure human performance on the benchmark, in order to learn whether significant headroom remains for further progress. We provide a conservative estimate of human performance on the benchmark through crowdsourcing: Our annotators are non-experts who must learn each task from a brief set of instructions and 20 examples. In spite of limited training, these annotators robustly outperform the state of the art on six of the nine GLUE tasks and achieve an average score of 87.1. Given the fast pace of progress however, the headroom we observe is quite limited. To reproduce the data-poor setting that our annotators must learn in, we also train the BERT model (Devlin et al., 2019) in limited-data regimes, and conclude that low-resource sentence classification remains a challenge for modern neural network approaches to text understanding.


Adaptive Design of Experiments for Conservative Estimation of Excursion Sets

arXiv.org Machine Learning

We consider a Gaussian process model trained on few evaluations of an expensive-to-evaluate deterministic function and we study the problem of estimating a fixed excursion set of this function. We review the concept of conservative estimates, recently introduced in this framework, and, in particular, we focus on estimates based on Vorob'ev quantiles. We present a method that sequentially selects new evaluations of the function in order to reduce the uncertainty on such estimates. The sequential strategies are first benchmarked on artificial test cases generated from Gaussian process realizations in two and five dimensions, and then applied to two reliability engineering test cases.